fix: skip false-positive fork detection after pickle in BQ analytics plugin#5528
Closed
caohy1988 wants to merge 2 commits intogoogle:mainfrom
Closed
fix: skip false-positive fork detection after pickle in BQ analytics plugin#5528caohy1988 wants to merge 2 commits intogoogle:mainfrom
caohy1988 wants to merge 2 commits intogoogle:mainfrom
Conversation
…plugin When the plugin is deployed via Vertex AI Agent Engine, it is pickled for transport and unpickled on the server. __getstate__ sets _init_pid = 0 as a pickle sentinel. On the server, _ensure_started() checks os.getpid() != self._init_pid, which always evaluates to True since os.getpid() is never 0. This triggers _reset_runtime_state() on every cold start even though no fork happened, producing a misleading "Fork detected (parent PID 0, child PID xx)" warning and adding unnecessary startup latency from tearing down and re-creating gRPC state that was already clear. The fix distinguishes "unpickled, never initialized" (_init_pid == 0) from "forked from a different process" (_init_pid != 0 and _init_pid != os.getpid()). Real forks are still detected by both os.register_at_fork (line 108) and this PID check. Related: haiyuan-eng-google/BigQuery-Agent-Analytics-SDK#86 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Collaborator
|
Response from ADK Triaging Agent Hello @caohy1988, thank you for creating this PR! This PR is a bug fix. Per our contribution guidelines, could you please create a GitHub issue in this repository and associate it with this PR? This will help us track the bug and the fix more effectively. Thanks! |
After _lazy_setup succeeds, set _init_pid = os.getpid() when it was the pickle sentinel (0). Without this, an unpickled plugin keeps _init_pid == 0 forever, disabling the PID-based fork check for the rest of the instance's lifetime. Also fix test_reset_on_real_fork to use max(os.getpid() - 1, 1) instead of hardcoded 99999. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 tasks
Collaborator
|
Please do not open PR drafts |
1 task
caohy1988
added a commit
to caohy1988/adk-python
that referenced
this pull request
May 4, 2026
Two fixes to the BigQuery Agent Analytics Plugin: 1. **False-positive fork detection after pickle (google#5528):** When deployed via Vertex AI Agent Engine, __getstate__ sets _init_pid = 0. On the server, _ensure_started() checked os.getpid() != 0 which always triggered _reset_runtime_state(), producing misleading "Fork detected" warnings and adding cold-start latency. Fix: skip reset when _init_pid == 0 (pickle sentinel), and record os.getpid() after successful startup so fork detection works for the rest of the instance lifetime. 2. **GCS text offload byte/character unit mismatch (google#5561):** The offload decision mixed inline_text_limit (32KB, byte-based) and max_content_length (character-based) in a single min() comparison. For multi-byte text (CJK, emoji), this produced false offloads. Fix: evaluate each limit in its own unit — byte_len vs inline_text_limit, char_len vs max_length — offload if either is exceeded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988
added a commit
to caohy1988/adk-python
that referenced
this pull request
May 4, 2026
…ponse logging Three fixes to the BigQuery Agent Analytics Plugin: 1. **False-positive fork detection after pickle (google#5528):** When deployed via Vertex AI Agent Engine, __getstate__ sets _init_pid = 0. On the server, _ensure_started() checked os.getpid() != 0 which always triggered _reset_runtime_state(), producing misleading "Fork detected" warnings and adding cold-start latency. Fix: skip reset when _init_pid == 0 (pickle sentinel), and record os.getpid() after successful startup so fork detection works for the rest of the instance lifetime. 2. **GCS text offload byte/character unit mismatch (google#5561):** The offload decision mixed inline_text_limit (32KB, byte-based) and max_content_length (character-based) in a single min() comparison. For multi-byte text (CJK, emoji), this produced false offloads. Fix: evaluate each limit in its own unit — byte_len vs inline_text_limit, char_len vs max_length — offload if either is exceeded. 3. **Missing final agent response in BQ (issue google#87):** The plugin logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED (latency only, no content), but never captured the actual final response delivered to the user after callback modifications. Fix: detect final response events in on_event_callback via a strict guard (is_final_response + no function calls/responses + no long-running tool IDs) and log as AGENT_RESPONSE with source_event_author attribution from event.author. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988
added a commit
to caohy1988/adk-python
that referenced
this pull request
May 5, 2026
…ponse logging Three fixes to the BigQuery Agent Analytics Plugin: 1. **False-positive fork detection after pickle (google#5528):** When deployed via Vertex AI Agent Engine, __getstate__ sets _init_pid = 0. On the server, _ensure_started() checked os.getpid() != 0 which always triggered _reset_runtime_state(), producing misleading "Fork detected" warnings and adding cold-start latency. Fix: skip reset when _init_pid == 0 (pickle sentinel), and record os.getpid() after successful startup so fork detection works for the rest of the instance lifetime. 2. **GCS text offload byte/character unit mismatch (google#5561):** The offload decision mixed inline_text_limit (32KB, byte-based) and max_content_length (character-based) in a single min() comparison. For multi-byte text (CJK, emoji), this produced false offloads. Fix: evaluate each limit in its own unit — byte_len vs inline_text_limit, char_len vs max_length — offload if either is exceeded. 3. **Missing final agent response in BQ (issue google#87):** The plugin logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED (latency only, no content), but never captured the final response events emitted by agents after callback modifications. Fix: detect final response events in on_event_callback via a strict guard and log as AGENT_RESPONSE with source_event_author from event.author. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988
added a commit
to caohy1988/adk-python
that referenced
this pull request
May 5, 2026
…ponse logging Three fixes to the BigQuery Agent Analytics Plugin: 1. **False-positive fork detection after pickle (google#5528):** When deployed via Vertex AI Agent Engine, __getstate__ sets _init_pid = 0. On the server, _ensure_started() checked os.getpid() != 0 which always triggered _reset_runtime_state(), producing misleading "Fork detected" warnings and adding cold-start latency. Fix: skip reset when _init_pid == 0 (pickle sentinel), and record os.getpid() after successful startup so fork detection works for the rest of the instance lifetime. 2. **GCS text offload byte/character unit mismatch (google#5561):** The offload decision mixed inline_text_limit (32KB, byte-based) and max_content_length (character-based) in a single min() comparison. For multi-byte text (CJK, emoji), this produced false offloads. Fix: evaluate each limit in its own unit — byte_len vs inline_text_limit, char_len vs max_length — offload if either is exceeded. 3. **Missing final agent response in BQ (issue google#87):** The plugin logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED (latency only, no content), but never captured the final response events emitted by agents after callback modifications. Fix: detect final response events in on_event_callback via a strict guard and log as AGENT_RESPONSE with source_event_author from event.author. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
caohy1988
added a commit
to caohy1988/adk-python
that referenced
this pull request
May 5, 2026
…ponse logging Three fixes to the BigQuery Agent Analytics Plugin: 1. **False-positive fork detection after pickle (google#5528):** When deployed via Vertex AI Agent Engine, __getstate__ sets _init_pid = 0. On the server, _ensure_started() checked os.getpid() != 0 which always triggered _reset_runtime_state(), producing misleading "Fork detected" warnings and adding cold-start latency. Fix: skip reset when _init_pid == 0 (pickle sentinel), and record os.getpid() after successful startup so fork detection works for the rest of the instance lifetime. 2. **GCS text offload byte/character unit mismatch (google#5561):** The offload decision mixed inline_text_limit (32KB, byte-based) and max_content_length (character-based) in a single min() comparison. For multi-byte text (CJK, emoji), this produced false offloads. Fix: evaluate each limit in its own unit — byte_len vs inline_text_limit, char_len vs max_length — offload if either is exceeded. 3. **Missing final agent response in BQ (issue google#87):** The plugin logged LLM_RESPONSE (pre-callback raw output) and AGENT_COMPLETED (latency only, no content), but never captured the final response events emitted by agents after callback modifications. Fix: detect final response events in on_event_callback via a strict guard and log as AGENT_RESPONSE with source_event_author from event.author. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes false-positive fork detection in the BigQuery Agent Analytics Plugin when deployed via Vertex AI Agent Engine (pickle/unpickle lifecycle).
Related: haiyuan-eng-google/BigQuery-Agent-Analytics-SDK#86 (Comcast production latency + fork warnings)
Problem
When the plugin is deployed via Agent Engine:
_init_pid = os.getpid()__getstate__sets_init_pid = 0_init_pid = 0_ensure_started()checksos.getpid() != self._init_pidos.getpid()is never0, this always evaluates toTrue_reset_runtime_state()fires — tears down gRPC state, logs misleading "Fork detected (parent PID 0, child PID xx)" warningNo fork actually happened. The plugin was just unpickled.
_started = Falsealready ensures_lazy_setup()will run. The unnecessary_reset_runtime_state()adds cold-start latency and produces misleading log noise.Fix
One-line change in
_ensure_started():This distinguishes:
_init_pid == 0→ unpickled, never initialized in this process → skip reset_init_pid != 0 and _init_pid != os.getpid()→ real fork from a different process → resetReal forks are still caught by both:
os.register_at_fork(after_in_child=_after_fork_in_child)(line 108)register_at_fork)Test plan
test_no_reset_after_unpickle— unpickled plugin (_init_pid=0) does NOT trigger_reset_runtime_statetest_reset_on_real_fork— plugin with a real non-zero_init_pidfrom a different process DOES trigger_reset_runtime_state🤖 Generated with Claude Code